<html lang="en"><head><style>body {
   color: black;
}
</style><title>Data Biography for Harvard Dataverse Entry: State Public Health Agency, CDC, and FDA Tweets (2012 through 2022)</title></head>
<body>
<header>
<h1 id="data-biography-for-harvard-dataverse-entry-state-public-health-agency-cdc-and-fda-tweets-2012-through-2022-">Data Biography for Harvard Dataverse Entry: &quot;State Public Health Agency, CDC, and FDA Tweets (2012 through 2022)&quot;</h1>
<p>Author: Samuel R. Mendez.<br/>Date: April 9, 2023.</p>
</header>
<h2 id="intro">Intro</h2>
<p>This data biography describes the contextual factors shaping the creation of the data set available at the <a href="https://doi.org/10.7910/DVN/VX4HK8">Harvard Dataverse entry &quot;State Public Health Agency, CDC, and FDA Tweets (2012 through 2022)&quot;</a>.</p>
<main>
<h2 id="what">What</h2>
<p>The data set is a collection of Tweet IDs (n=690281) for Tweets posted from 2012 through 2022 by accounts managed by state public health agencies and state-level pandemic response efforts, and by CDC- or FDA-affiliated accounts I deemed relevant to pandemic communications. It is available in TXT format, with each ID on its own line. It is also available in CSV and RData formats, which include a custom variable I created manually to indicate the state served or the federal agency affiliation of each authoring account. Along with the data set itself, I have shared the RMD file containing the scripts I used to create the data set. Note that the data set&#39;s contents were contingent upon the availability of data through the Twitter API for Academic Research (v2) on April 9, 2023. Factors including Twitter API access policy, author account settings, and individual Tweet settings all shape data availability and may change over time. Thus, attempts to hydrate the Tweet IDs or to recreate the data set creation process may lead to different outputs over time.</p>
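<p>For readers who plan to hydrate the IDs, the following is a minimal Python sketch of one way to read a one-ID-per-line TXT file and split the IDs into request-sized groups. It is not part of the shared RMD scripts. The function names and the sample IDs are made up for illustration, and the batch size of 100 assumes the per-request limit of the Twitter API v2 Tweet lookup endpoint, which may change along with API access policy. IDs are kept as strings because they exceed the range that spreadsheet tools and floating-point columns can represent exactly.</p>
<pre>
```python
import io


def read_tweet_ids(lines):
    """Return Tweet IDs as strings, one per non-empty line."""
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue  # skip blank lines, e.g. a trailing newline
        if not tweet_id.isdigit():
            raise ValueError(f"not a numeric Tweet ID: {tweet_id!r}")
        ids.append(tweet_id)
    return ids


def batches(ids, size=100):
    """Yield consecutive slices holding at most `size` IDs each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Stand-in for the TXT file; these three IDs are invented placeholders.
sample = io.StringIO(
    "1645000000000000001\n"
    "1645000000000000002\n"
    "\n"
    "1645000000000000003\n"
)
ids = read_tweet_ids(sample)
groups = list(batches(ids, size=2))  # size=2 only to keep the demo small
print(len(ids), len(groups))
```
</pre>
<p>With the real file, <code>open(path, encoding="utf-8")</code> can be passed in place of the in-memory sample, where <code>path</code> is wherever the TXT file was saved. Comparing the number of IDs read against the number of Tweets returned is a simple way to see how much of the data set is still available at hydration time.</p>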
<h2 id="why">Why</h2>
<p>I created this data set as part of my dissertation research as a PhD candidate in the Department of Social and Behavioral Sciences at the Harvard T.H. Chan School of Public Health. I planned to use subsets of this data set for two streams of work. One focuses on testing the feasibility of using supervised machine learning to predict human scoring of public health agency Tweets via the CDC Clear Communication Index. The other focuses on integrating health literacy metrics with natural language processing techniques under a media studies-informed theoretical framework. As such, my interest in public health agency Tweets comes from research interests in organizational health literacy and health communication. The relative ease of collecting Twitter data compared to other online communications made Twitter stand out to me as a good site to develop these methods. Although it is only one platform, I also see value in Twitter data because Twitter is a prominent professional communication tool in public health, journalism, and academic research. For example, many health agencies and health professionals use Twitter and follow each other, and Tweets are commonly included in online news reporting. As a communication researcher myself, I do not typically encounter Facebook posts, TikTok videos, newsletters, or other online communication channels as a visible site of professional engagement in this way.</p>
<p>In creating this data set, I wanted to give myself flexibility in this second stream by covering a larger time frame of Tweets than I initially intended to explore. Though I knew my dissertation research would not go as far back as 2012, I recognized this as an opportunity to help enable more health communication research. The hardest parts of this process were manually finding the state public health agency Twitter accounts and writing scripts to work with the results of Twitter API calls. With that work done, expanding the time frame of my API calls was relatively fast and easy. This data set and these scripts will likely help enable my own future research projects, and I wanted to share them to help enable other researchers&#39; projects as well. As a health communication researcher, I find that the use of social media by state-level public health agencies is understudied. As such, beyond research transparency and reproducibility, I wanted to share this data set as a small way of contributing to the field.</p>
<h2 id="who">Who</h2>
<p>I, Samuel R. Mendez (they/them), created this data set and the accompanying data descriptors and R scripts. At the time of writing, I am a third-year PhD candidate in the Department of Social and Behavioral Sciences at the Harvard T.H. Chan School of Public Health. My dissertation committee consists of Dr. Karen Emmons, Dr. K. Viswanath, and Dr. Sebastian Munoz-Najar Galvez. I manually searched for the public health agency Twitter handles and used them to create Twitter API (v2) queries. I wrote the script to carry out the data download and processing, and I created the documentation available on Harvard Dataverse.</p>
<p>The Tweets themselves were authored by public health agency accounts. Each state and federal agency has its own process for social media use and its own internal organizational structure. As such, the identity of individual authors/editors is unclear, as is the intended relationship between material produced for Twitter and material produced for other communication channels. For example, some agencies may have a team dedicated to social media communication, while others may distribute this responsibility alongside other forms of media outreach or community engagement. How Twitter usage relates to the rest of an agency&#39;s communication strategy, and what resources public health agencies across the country have available for social media communication, are worthy research questions in their own right. Neither is explored in this data set. That being said, I consider it reasonable to assume that Twitter, as a prominent professional communication tool, can provide insight into a public health organization&#39;s overall professional communication priorities.</p>
<h2 id="when">When</h2>
<p>I first collected Twitter handles and verification data manually in the summer of 2022. On April 8, 2023, I manually revisited the account pages on Twitter to verify and update the information as needed, particularly the verification status coding, which I thought was likely to change alongside Twitter account verification policy changes. On April 9, 2023, I executed the scripts in the RMD file associated with this Dataverse entry to produce the data set.</p>
<h2 id="where">Where</h2>
<p>I completed this work online in the United States.</p>
<h2 id="how">How</h2>
<p>I used Google, public health agency websites, and the Twitter search bar to identify the accounts associated with state public health agencies and state-level pandemic response efforts. I also visited CDC and FDA websites with listings of agency-affiliated accounts to identify those I deemed relevant to pandemic communications.</p>
<p>I accessed the data via the Twitter API for Academic Research (v2). To write and execute the scripts, I used the R package &quot;academictwitteR&quot; (version 0.3.1), in R (version 4.2.3), in RStudio (2022.12.0+353 &quot;Elsbeth Geranium&quot; Release for Windows).</p>
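<p>To give a rough sense of what a handle-based query against the v2 full-archive search endpoint can look like, here is a hedged Python sketch that only builds the request parameters and makes no network call. It is not the R code in the shared RMD file, which relies on academictwitteR. The handles, the helper names, and the exact time window below are assumptions made for illustration, and whether the endpoint is reachable depends on current Twitter API access policy.</p>
<pre>
```python
from datetime import datetime, timezone

# Full-archive search endpoint of the Twitter API v2.
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"


def to_api_time(moment):
    """Format a timezone-aware datetime as a UTC timestamp string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_params(handles, start, end, max_results=500):
    """Build query parameters for Tweets authored by any of the handles."""
    query = " OR ".join(f"from:{handle}" for handle in handles)
    return {
        "query": query,
        "start_time": to_api_time(start),
        "end_time": to_api_time(end),
        "max_results": max_results,
    }


# Hypothetical handles and time window, for illustration only.
handles = ["ExampleHealthDept", "ExampleCovidResponse"]
params = build_search_params(
    handles,
    start=datetime(2012, 1, 1, tzinfo=timezone.utc),
    end=datetime(2023, 1, 1, tzinfo=timezone.utc),
)
print(SEARCH_ALL_URL)
print(params)
```
</pre>
<p>A real request would also need a bearer token and pagination handling, which the sketch leaves out. In practice, a long list of handles would have to be split across several queries to stay within the API&#39;s query length limit.</p>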
</main>
</body></html>